Genetics of smoking and risk of clonal hematopoiesis

Clonal hematopoiesis of indeterminate potential (CHIP) and mosaic chromosomal alterations (mCAs) represent two forms of clonal hematopoiesis in which expanded clones bearing somatic mutations have been linked to both oncologic and non-oncologic clinical outcomes, including atherosclerosis and all-cause mortality. Epidemiologic studies have highlighted smoking as an important driver of somatic mutations across multiple tissues. However, establishing the causal role of smoking in clonal hematopoiesis has been limited by observational study designs, which may suffer from confounding and reverse causality. We performed two complementary analyses to investigate the role of smoking in mCAs and CHIP. First, using an observational study design among UK Biobank participants, we confirmed strong associations between smoking and mCAs. Second, in two-sample Mendelian randomization (MR) analyses, smoking was strongly associated with mCAs but not with CHIP. Overall, these results support a causal association between smoking and mCAs and suggest smoking may variably shape the fitness of clones bearing somatic mutations.


Scientific Reports | (2022) 12:7248 | https://doi.org/10.1038/s41598-022-09604-z | www.nature.com/scientificreports/

[...] may be less susceptible to confounding and reverse causality, enabling estimation of putative causal associations between exposures and outcomes 19. We aimed to (1) confirm observational associations between smoking and common manifestations of somatic mutation (mCA and CHIP), and (2) estimate putative causal associations between smoking and these outcomes within the MR framework.
Next, we performed two-sample MR using summary statistics to evaluate the causal effects of smoking on somatic mutation outcomes. As a genetic proxy for smoking, we considered up to 119 independent genetic variants associated with smoking at genome-wide significance (p < 5 × 10⁻⁸) 16. The F-statistics for our genetic instruments ranged from 21.78 to 196 (mean 48.5), suggesting the analysis was not limited by weak-instrument bias. In the primary inverse variance-weighted MR analysis, smoking was strongly associated with mCAs (OR 1.44, 95% CI 1.16-1.79, p = 8 × 10⁻⁴) and mCA-LOY (OR 1.06, 95% CI 1.04-1.08, p = 1 × 10⁻⁸) (Fig. 2). We did not detect a significant association between smoking and CHIP (transethnic OR 1.01, 95% CI 0.74-1.37, p = 1; European [EUR] OR 0.70, 95% CI 0.48-1.02, p = 0.06); however, these confidence intervals do not exclude potentially meaningful effects (Fig. 2). Results were similar using alternative MR methods, each of which makes different assumptions about outliers and pleiotropy (Fig. 2). The MR-Egger bias intercept test did not detect evidence of directional pleiotropy (p > 0.05 for all comparisons).
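As a rough illustration of the quantities reported above, the following Python sketch computes per-variant F-statistics and a fixed-effect inverse variance-weighted (IVW) estimate from summary statistics, then converts the log-odds estimate to an odds ratio with a 95% CI. It is not the authors' code, which used the TwoSampleMR package in R. The function names, the (beta/SE)² approximation for F, the fixed-effect weighting, and all numeric inputs are assumptions chosen for illustration.

```python
import math

def f_statistic(beta_exposure, se_exposure):
    """Approximate per-variant instrument strength as (beta / SE)^2 (assumed form)."""
    return (beta_exposure / se_exposure) ** 2

def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW: weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_out^2."""
    weights = [1.0 / s ** 2 for s in se_out]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exp, beta_out))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exp))
    return num / den, math.sqrt(1.0 / den)

def to_odds_ratio(beta, se, z=1.96):
    """Convert a log-odds estimate and its SE to (OR, lower 95% CI, upper 95% CI)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical summary statistics for four variants (not taken from the paper).
beta_exp = [0.020, 0.015, 0.030, 0.025]   # variant-smoking effects
se_exp = [0.003, 0.003, 0.004, 0.004]
beta_out = [0.008, 0.005, 0.011, 0.010]   # variant-outcome effects (log odds)
se_out = [0.004, 0.004, 0.005, 0.005]

f_stats = [f_statistic(b, s) for b, s in zip(beta_exp, se_exp)]
beta_ivw, se_ivw = ivw_estimate(beta_exp, beta_out, se_out)
odds_ratio, ci_low, ci_high = to_odds_ratio(beta_ivw, se_ivw)
print("F-statistics:", [round(f, 1) for f in f_stats])
print(f"IVW OR {odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An F-statistic above roughly 10 is a commonly used rule of thumb for adequate instrument strength, which is the sense in which the reported range of 21.78 to 196 argues against weak-instrument bias.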
Overall, these results are consistent with smoking as a causal risk factor for mCA. Although smoking was associated with CHIP in our observational analysis, we did not detect an association in our MR analysis; whether smoking represents a causal risk factor for CHIP, including for specific genes, will require further study. Our findings suggest that smoking variably shapes the fitness of distinct somatic mutations, and efforts to reduce smoking would be expected to reduce the burden of the downstream consequences of somatic mutation.
This study has both strengths and limitations. The MR framework allowed us to leverage the natural randomization in the distribution of genetic variants to estimate the causal associations between smoking and manifestations of somatic mutations. By utilizing large GWAS, we were able to consider thousands of mCA and CHIP cases, which would otherwise require large, extended trials to accrue. Although we were able to identify strong associations between smoking and mCA, whether mCA mediates some of the adverse consequences of smoking will require further study. Similarly, we do not provide specific insights regarding the mechanisms by which smoking influences the fitness of clones bearing somatic mutations. Given the low heritability of CHIP and the small number of CHIP cases, which limits power, we cannot exclude meaningful associations between smoking and CHIP by gene in an MR framework. Smoking has been linked to the fitness of clones harboring particular driver mutations in the setting of lung cancer 13, and whether similar findings extend to CHIP remains an important avenue for future study. With the growing availability of population-scale human genotype and sequence data, linking smoking to particular mutational drivers of clonal fitness should become increasingly tractable.
In conclusion, we confirm strong observational associations between smoking and somatic mutation, with MR analyses consistent with smoking as a causal risk factor for mCA. Whether smoking causes CHIP, and the specific mechanisms by which smoking influences clonal fitness, will require further study.

Methods
Observational analysis. Whole-exome sequencing and array genotyping have been previously described in the UK Biobank, a population-based volunteer biobank recruited in 2006-2010 22. mCA was determined among 479,810 participants without hematologic malignancy who underwent genome-wide genotyping, and CHIP was determined among up to 48,966 participants without hematologic malignancy who underwent whole-exome sequencing, as previously described 3,8,17. We tested for the association between ever smoking (defined by UK Biobank unique data identifier 20116-0.0) and somatic mutation outcomes (mCA, autosomal mCA, mCA-LOY [loss-of-Y chromosome], mCA-LOX [loss-of-X chromosome], lymphoid mCA, myeloid mCA, del14q, and CHIP) using logistic regression adjusted for age, age², sex, sequencing batch, and 15 genetic principal components. Myeloid and lymphoid mCAs were identified based on their association with myeloid and lymphoid malignancies 23. In a secondary analysis we tested for associations between smoking and CHIP by gene (e.g., DNMT3A, TET2, ASXL1). In a sensitivity analysis we included self-reported alcohol use as an additional covariate, given strong epidemiologic correlations with smoking. This work was performed using UK Biobank Application #7089. The UK Biobank obtained IRB approval from the North West Multi-centre Research Ethics Committee (approval number: 11/NW/0382), and participants provided informed consent.
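The adjusted logistic regression described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' pipeline: the data are simulated, age is centered and rescaled for numerical stability, sequencing batch and the 15 genetic principal components are omitted for brevity, and the helper names (`solve_linear`, `fit_logistic`) are hypothetical.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(rows, y, n_iter=15):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    Each row must already include a leading 1.0 for the intercept."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Simulated cohort (hypothetical effect sizes, not UK Biobank data).
random.seed(0)
rows, y = [], []
for _ in range(2000):
    smoker = 1.0 if random.random() < 0.45 else 0.0
    age_c = (random.uniform(40, 70) - 55.0) / 10.0   # centered, per decade
    sex = 1.0 if random.random() < 0.5 else 0.0
    eta = -2.0 + 0.5 * smoker + 0.4 * age_c + 0.1 * age_c ** 2 + 0.2 * sex
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    rows.append([1.0, smoker, age_c, age_c ** 2, sex])

coef = fit_logistic(rows, y)
print(f"Adjusted OR for ever smoking: {math.exp(coef[1]):.2f}")
```

In practice an established statistics package would be used instead of a hand-written solver; the sketch only makes explicit that the reported odds ratio is the exponentiated smoking coefficient from a model that also contains age, age², and sex.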

Mendelian randomization analysis. We performed two-sample MR using summary statistics with the TwoSampleMR package in R. As genetic instruments for smoking, we considered independent (r² < 0.001, distance > 10,000 kb) genetic variants associated (p < 5 × 10⁻⁸) with the lifetime smoking index, a previously validated continuous measure of lifetime smoking, derived among up to 462,690 UK Biobank participants 16.
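The instrument-selection criteria above (genome-wide significance plus independence) are typically applied by greedy clumping. The Python sketch below shows one common interpretation: variants are ranked by p value, and a variant is dropped if it lies within the distance window of an already-kept variant on the same chromosome and is in linkage disequilibrium with it at or above the r² threshold. The variant records, the `r2_lookup` callback (which in practice would query an ancestry-matched reference panel), and the function name are hypothetical.

```python
def select_instruments(variants, r2_lookup, p_threshold=5e-8,
                       r2_threshold=0.001, window_kb=10_000):
    """Greedy clumping sketch: keep the most significant variant first, then
    skip any later variant that is within window_kb of a kept variant on the
    same chromosome and has r^2 >= r2_threshold with it."""
    candidates = sorted((v for v in variants if v["p"] < p_threshold),
                        key=lambda v: v["p"])
    kept = []
    for v in candidates:
        clash = any(
            v["chrom"] == k["chrom"]
            and abs(v["pos"] - k["pos"]) <= window_kb * 1000
            and r2_lookup(v["rsid"], k["rsid"]) >= r2_threshold
            for k in kept
        )
        if not clash:
            kept.append(v)
    return kept

# Hypothetical variants and pairwise LD values (not real data).
variants = [
    {"rsid": "rsA", "chrom": 1, "pos": 1_000_000, "p": 1e-12},
    {"rsid": "rsB", "chrom": 1, "pos": 1_500_000, "p": 1e-9},   # in LD with rsA
    {"rsid": "rsC", "chrom": 1, "pos": 50_000_000, "p": 1e-10}, # far from rsA
    {"rsid": "rsD", "chrom": 2, "pos": 2_000_000, "p": 1e-6},   # not significant
    {"rsid": "rsE", "chrom": 2, "pos": 3_000_000, "p": 3e-8},
]
ld_table = {frozenset(("rsA", "rsB")): 0.4}

def r2_lookup(a, b):
    """Return r^2 for a pair of variants; pairs missing from the table count as 0."""
    return ld_table.get(frozenset((a, b)), 0.0)

instruments = select_instruments(variants, r2_lookup)
print([v["rsid"] for v in instruments])
```

Here rsB is removed because it sits 500 kb from the more significant rsA with r² = 0.4, and rsD is removed because it does not reach genome-wide significance.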
For each genetic variant associated with smoking, we extracted the corresponding effects for each outcome from GWAS of mCA (up to 767,891 unrelated multi-ancestry individuals without hematological cancer from UK Biobank, Biobank Japan, Mass General Brigham Biobank, and FinnGen), mCA-LOY (up to 205,011 male participants of UK Biobank), and CHIP (97,691 participants of TOPMed) 12,17,18. We calculated F-statistics for each exposure-outcome pair to assess for weak-instrument bias 19. For the primary analysis we applied the inverse variance-weighted method, but considered weighted median, MR-Egger, and MR-PRESSO in sensitivity analyses, as these methods make different assumptions about the presence of pleiotropy and outliers 24. Analyses were performed using R 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria). For the primary observational and MR analyses, p values < 0.05 (after accounting for multiple comparisons using Bonferroni adjustment) were considered significant. For secondary analyses, p < 0.05 was considered significant. All methods were carried out in accordance with the relevant guidelines and regulations.
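To make the different assumptions of the sensitivity methods concrete, the Python sketch below gives point estimates for two of them. The weighted median takes the weighted median of per-variant ratio estimates, so it tolerates a minority of invalid instruments; MR-Egger adds an intercept to the weighted regression, and a non-zero intercept suggests directional pleiotropy. Standard errors (usually obtained by bootstrap for the weighted median) and MR-PRESSO are omitted. The function names and inputs are hypothetical, and the code is a sketch rather than the TwoSampleMR implementation.

```python
def weighted_median(beta_exp, beta_out, se_out):
    """Weighted median of ratio estimates (needs at least two variants).
    Weights are first-order inverse variances of the ratios: (bx / se_out)^2."""
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    weights = [(bx / s) ** 2 for bx, s in zip(beta_exp, se_out)]
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    r = [ratios[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    cum, running = [], 0.0
    for wi in w:
        running += wi
        cum.append((running - 0.5 * wi) / total)  # midpoint cumulative weight
    below = max(i for i, c in enumerate(cum) if c < 0.5)
    # Linear interpolation to the point where cumulative weight reaches 0.5.
    frac = (0.5 - cum[below]) / (cum[below + 1] - cum[below])
    return r[below] + (r[below + 1] - r[below]) * frac

def mr_egger(beta_exp, beta_out, se_out):
    """Weighted regression with intercept; returns (intercept, slope).
    Variants are oriented so that exposure effects are positive."""
    bx = [abs(b) for b in beta_exp]
    by = [o if e >= 0 else -o for e, o in zip(beta_exp, beta_out)]
    w = [1.0 / s ** 2 for s in se_out]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, bx)) / sw
    my = sum(wi * v for wi, v in zip(w, by)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, bx))
    sxy = sum(wi * (x - mx) * (v - my) for wi, x, v in zip(w, bx, by))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical summary statistics; the last variant is a pleiotropic outlier.
beta_exp = [0.020, 0.015, 0.030, 0.025, 0.020]
beta_out = [0.008, 0.006, 0.012, 0.010, 0.100]
se_out = [0.004, 0.004, 0.004, 0.004, 0.004]

print("weighted median:", weighted_median(beta_exp, beta_out, se_out))
print("MR-Egger (intercept, slope):", mr_egger(beta_exp, beta_out, se_out))
```

In this toy input four of the five variants share a ratio of 0.4, so the weighted median stays at that value despite the outlier, whereas a regression-based estimate is pulled toward it.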